DOCUMENT RESUME

ED 126 639                                              EC 090 527

AUTHOR          Schmelkin, Liora
TITLE           Statistical Power Analysis of Research in
                "Exceptional Children."
PUB DATE        Apr 76
NOTE            12p.; Paper presented at the Annual International
                Convention, The Council for Exceptional Children
                (54th, Chicago, Illinois, April 4-9, 1976)

EDRS PRICE      MF-$0.83 HC-$1.67 Plus Postage.
DESCRIPTORS     Educational Research; Evaluation Methods; Exceptional
                Child Research; *Handicapped; *Research Methodology;
                *Statistical Analysis
IDENTIFIERS     *Power Analysis

ABSTRACT

A statistical power analysis of research studies in
special education reported in Volumes 39 and 40 of Exceptional
Children (1972-73 and 1973-74) was conducted. The basic concepts of a
power analysis (Type I and Type II errors, and conventional effect
sizes) were reviewed, and the studies evaluated for statistical
power. Using the .05 level of significance, the average power to
detect small effects was .11 (5% having a better than 50-50 chance to
declare significant findings), for medium effects the average power
was .49 (43% having a better than 50-50 chance), and for large
effects the average power was .82 (76% having a better than 50-50
chance). There are several approaches to increasing power, including
increasing the number of Ss, increasing the level of alpha, altering
the research design, using directional tests, and using highly
reliable measures. Statistical power analysis should become a part of
research training and one of the criteria for the evaluation of
research reports. (Author/IB)






Documents acquired by ERIC include many informal unpublished
materials not available from other sources. ERIC makes every effort
to obtain the best copy available. Nevertheless, items of marginal
reproducibility are often encountered and this affects the quality
of the microfiche and hardcopy reproductions ERIC makes available
via the ERIC Document Reproduction Service (EDRS). EDRS is not
responsible for the quality of the original document. Reproductions
supplied by EDRS are the best that can be made from the original.



Statistical Power Analysis of
Research in Exceptional Children 






Liora Schmelkin

Department of Educational Psychology
New York University
933 Shimkin Hall
New York, N.Y. 10003

Position: Doctoral Candidate




Schmelkin



Abstract

In recent years some behavioral scientists have attempted to alert and
sensitize researchers to the important distinction between statistical
significance and meaningfulness of findings in behavioral research. Moreover,
they have attempted to impress researchers with the need to consider magnitudes
of effects and statistical power in the design of their studies. Despite these
attempts, it appears that the vast majority of studies in the social sciences
are planned, executed, and reported without any concern with issues of
substantive meaningfulness and the statistical power of the tests being used.

The present paper was devoted to a statistical power analysis of research
studies in special education reported in Volumes 39 and 40 of Exceptional
Children (1972-73, 1973-74). After reviewing the basic concepts involved
in such an analysis, namely Type I and Type II errors, and conventional effect
sizes (i.e., small, medium, and large), the published research was scrutinized
for statistical power. It was found that the average power to detect small
effects was .11, with only 5% of the tests having a better than 50-50 chance to
declare the findings as being significant at the .05 level of significance. For
medium effects, the average power was .49 with 43% of the tests having a better
than 50-50 chance to declare the findings as being significant. For large
effects, the average power was .82 with 76% having a better than 50-50 chance.

The paper concludes with a summary and recommendations for making power
analysis an integral part of the research endeavor.






Statistical Power Analysis of
Research in Exceptional Children

Despite the controversy surrounding tests of significance, and despite
attempts to alert researchers to the need to interpret results substantively,
findings are still reported almost exclusively in terms of significance. As
is well known, given a large enough sample, any finding can be declared
statistically significant. Consequently, it is important to distinguish
between results that are statistically significant but not substantively
important and results that are important but that are declared not significant
because of low power in the statistical test used.
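
The dependence of significance on sample size can be illustrated
numerically. What follows is a minimal sketch and not part of the original
analysis: it assumes two equal groups, a standardized mean difference d, and
a large-sample normal (z) approximation in place of the exact t distribution.
The function name and the sample values are hypothetical.

```python
from math import sqrt
from statistics import NormalDist

_STD_NORMAL = NormalDist()  # standard normal distribution


def two_sided_p_value(d, n_per_group):
    """Approximate two-sided p-value for an observed standardized mean
    difference d between two groups of n_per_group subjects each.

    Assumption: large-sample z approximation with unit variance in
    each group (not the exact t test)."""
    z = d * sqrt(n_per_group / 2)
    return 2 * (1 - _STD_NORMAL.cdf(abs(z)))


# A trivial difference (one-twentieth of a standard deviation) with a
# very large sample, versus a sizable difference (one-half of a
# standard deviation) with a small sample. Both cases are hypothetical.
trivial = two_sided_p_value(d=0.05, n_per_group=10_000)
sizable = two_sided_p_value(d=0.50, n_per_group=10)

print(f"d = 0.05, n = 10000 per group: p = {trivial:.4f}")
print(f"d = 0.50, n = 10 per group:    p = {sizable:.4f}")
```

Under these assumptions the trivial difference falls below the .05 level
while the sizable one does not, which is the distinction drawn above.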

Judging from published research findings, most behavioral researchers
seem to be either unaware of or not concerned with the important role of
statistical power analysis in the design of research (Brewer, 1972; Cohen,
1962; Hopkins, 1973). This frequently leads to situations in which results
that are substantively not meaningful are declared to be statistically
significant, or ones in which meaningful results are declared to be
statistically nonsignificant. Either state of affairs is an unhappy one; the
first being trivial, while the second is fraught with ambiguities or
perplexities. "It is unfortunate that failure to confirm hypotheses has
become equated with experimental failure" (Miller & Knapp, 1971, p. 7).

Of the four factors that affect the power of statistical tests (i.e.,
effect size, Type I error, Type II error, number of subjects), the most
important, but the one most often overlooked, is the anticipated effect size
(ES). The ES is "the degree to which the phenomenon is present in the
population or the degree to which the null hypothesis is false" (Cohen, 1969,
pp. 9-10). The point of departure in designing research should be the
determination of the magnitude of the expected effect. Since it is rarely the
case that there are no differences between groups on any variable and since
any difference, no matter how small, can be found to be significant if a large
enough number of subjects (N) is used, it is incumbent on the researcher to
specify what difference he will consider meaningful and to design his research
so that, when in fact a meaningful finding is obtained, the probability of
declaring it to be significant is high. The determination of the minimum ES to
be considered meaningful is based, among other things, on the nature of the
research in the area under investigation, the investment of effort and money,
and the consequences of rejecting the null hypothesis. For example, in research
on the differential effects of different remediation programs the magnitude of
the ES considered meaningful depends upon the relative efforts and costs
involved in implementing each of the programs and the possible impact each
program may have in the area under study, as well as on related areas. While
the decision about the desired ES in a given study is best made on the basis
of theoretical and practical considerations, it frequently happens that the
researcher does not have the information necessary for a meaningful decision.
Under such circumstances, one may resort to conventional criteria for ES.
Cohen (1962, 1969), for example, proposes that for an analysis of differences
between groups, mean differences of one-quarter, one-half and one standard
deviation be considered small, medium and large effects respectively.
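
To make these conventions concrete, the sketch below estimates the power of
a nondirectional two-group comparison for each of the three conventional
effect sizes named above. It is an illustration only: it assumes a normal
approximation rather than the exact distributions behind Cohen's tables, and
the function name and the choice of 30 subjects per group are hypothetical.

```python
from math import sqrt
from statistics import NormalDist

_STD_NORMAL = NormalDist()


def power_two_sample(d, n_per_group, alpha=0.05):
    """Approximate power of a nondirectional two-group test of means.

    d is the mean difference in standard deviation units. The normal
    approximation is an assumption made for this sketch; exact power
    requires the noncentral t distribution."""
    z_crit = _STD_NORMAL.inv_cdf(1 - alpha / 2)
    shift = d * sqrt(n_per_group / 2)  # expected value of the test statistic
    upper = 1 - _STD_NORMAL.cdf(z_crit - shift)
    lower = _STD_NORMAL.cdf(-z_crit - shift)
    return upper + lower


# Conventional effect sizes as stated in the text: one-quarter,
# one-half, and one standard deviation.
conventions = {"small": 0.25, "medium": 0.50, "large": 1.00}
powers = {label: power_two_sample(d, n_per_group=30)
          for label, d in conventions.items()}

for label, value in powers.items():
    print(f"{label:>6} effect: power = {value:.2f}")
```

With 30 subjects per group, only the large effect has a high probability of
being declared significant under this approximation.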

As is known, the four factors affecting statistical power are interrelated,
and the selection of any three of them determines the fourth. Most researchers
seem to adopt the conventional levels of significance (e.g., .05 or .01) and
conduct their study with whatever number of subjects is available to them.
What is called for, instead, is to specify, in addition to α, the ES and the
desired power of the test (i.e., 1 - β). "The seriousness of potential error
determines how much power is necessary" (Miller & Knapp, 1971, p. 8). In
the event one is unable to make such a determination, it has been suggested
that β be set equal to .20, so that the probability of rejecting a false null
hypothesis for the ES selected is 80% (Cohen, 1969). Having selected the
three factors mentioned above it is possible to calculate the sample size
necessary.
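
The calculation described above can be sketched as follows for a
nondirectional comparison of two group means. This is a rough illustration
under a normal approximation, not the procedure used in Cohen's tables (which
are based on exact distributions and give slightly larger values); the
function name is hypothetical.

```python
from math import ceil
from statistics import NormalDist

_STD_NORMAL = NormalDist()


def n_per_group(d, alpha=0.05, beta=0.20):
    """Approximate number of subjects needed in each of two equal groups
    to detect a mean difference of d standard deviations with a
    nondirectional test at level alpha and power 1 - beta.

    Assumption: normal approximation, n = 2 * ((z_alpha + z_beta) / d)**2."""
    z_alpha = _STD_NORMAL.inv_cdf(1 - alpha / 2)
    z_beta = _STD_NORMAL.inv_cdf(1 - beta)
    return ceil(2 * ((z_alpha + z_beta) / d) ** 2)


# Alpha = .05 and beta = .20, with the three conventional effect sizes.
n_small = n_per_group(0.25)
n_medium = n_per_group(0.50)
n_large = n_per_group(1.00)

print("subjects per group:", n_small, n_medium, n_large)
```

Halving the effect size roughly quadruples the number of subjects required,
which is why the ES must be specified before the data are collected.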

In an attempt to alert researchers to these problems, several surveys
have been conducted that have tried to ascertain the power of representative
studies in the areas of psychology and education (Brewer, 1972; Cohen, 1962).
The present investigation was designed to study the statistical power of
studies in the area of special education. Specifically, Volumes 39 and 40
of Exceptional Children were reviewed and the power each had of detecting
small, medium, and large ES was calculated. The findings were then compared
to those reported by Cohen (1962) and Brewer (1972).

Procedures

For purposes of comparison, it was necessary to impose standard conditions
on the assessment. Since there was no evidence to indicate that a level of
significance was set prior to the data collection in any of the articles, the
.05 level of significance was used uniformly, as has been done in the surveys
mentioned above. In addition, nondirectionality of hypotheses was assumed
throughout. In cases where total N's were reported without a breakdown, it
was assumed that there were equal n's in each group. This procedure enables
one to detect the maximum power available under optimal conditions. Since
no mention of ES was made in any of the articles, the operational
definitions of small, medium, and large effect sizes for each test were
selected following Cohen (1969). No attempt was made to consider other
problems in research design.
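
The equal-n assumption yields the maximum power for a fixed total N because
an even split minimizes the standard error of the difference between two
means. The sketch below illustrates this under a normal approximation; the
function name and the group sizes are hypothetical and are not taken from
any of the articles surveyed.

```python
from math import sqrt
from statistics import NormalDist

_STD_NORMAL = NormalDist()


def power_two_groups(d, n1, n2, alpha=0.05):
    """Approximate power of a nondirectional two-group test of means
    with possibly unequal group sizes (normal approximation assumed)."""
    z_crit = _STD_NORMAL.inv_cdf(1 - alpha / 2)
    shift = d * sqrt(n1 * n2 / (n1 + n2))
    return (1 - _STD_NORMAL.cdf(z_crit - shift)) + _STD_NORMAL.cdf(-z_crit - shift)


# Same total N of 60 and a medium effect (one-half standard deviation),
# split evenly versus unevenly.
balanced = power_two_groups(0.50, 30, 30)
unbalanced = power_two_groups(0.50, 50, 10)

print(f"30/30 split: power = {balanced:.2f}")
print(f"50/10 split: power = {unbalanced:.2f}")
```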

The three statistical tests most prevalent in the two volumes under review
are t tests, analyses of variance (F tests), and correlational analyses (r).¹
Since other tests (e.g., sign tests) occurred only once or twice, this survey
is limited to the above three. Of all the articles that used statistical
tests, 35 contained enough information to calculate the power of the tests for
the three ES. The remaining articles either used tests other than the three
under study or did not provide sufficient information. This was primarily due
to two omissions: (a) no specification as to the exact nature of the test
used, and (b) insufficient information as to the number of subjects. It should
be noted that this state of affairs not only prohibits post hoc power analysis,
it does not permit one to adequately interpret the results of the studies in
question.

Of the 35 articles, 5 employed 2 types of tests. When more than one
type of test was used, a separate power analysis was conducted for each.
Median power for each of the distinct statistical tests in a given article was
used in the tabulation of the results. While other surveys report means
instead of medians, it is felt that the latter is preferable since it is not
affected by extreme cases.
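
The preference for the median can be seen in a small numerical example. The
power values below are invented for illustration and do not come from any
article in the survey.

```python
from statistics import mean, median

# Hypothetical power values for four tests reported in one article;
# three are low and one is an extreme case.
article_powers = [0.10, 0.12, 0.15, 0.95]

article_mean = mean(article_powers)
article_median = median(article_powers)

print(f"mean power:   {article_mean:.3f}")
print(f"median power: {article_median:.3f}")
```

The single extreme value pulls the mean well above the level of the other
three tests, while the median stays close to them.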

Results and Discussion

The power distributions for the t, F, and r analyses are presented in
Table 1. The results can be summarized as follows: The average power for
small effects across the three types of tests used was .11 with only 5% of
the tests having a better than 50-50 chance of detecting such effects. For
medium effects the average power was .49 with 43% having a better than 50-50
chance. For large effects, the average power was .82 with 76% having a better
than 50-50 chance of detecting effects of this magnitude.

When comparing the present survey with the surveys by Cohen in the
Journal of Abnormal and Social Psychology (1962) and by Brewer in the American
Educational Research Journal (1972), it should be kept in mind that in
each of the surveys the problem was approached in slightly different ways.
Cohen's analysis combined the different tests into one grouping which yielded
a power index for each article, regardless of the analyses involved. The
median power findings of his survey were .17, .46, and .89 for small, medium,
and large effects respectively. Brewer presented the data classified into the
various statistical tests he examined (F, t, and r). In contrast with the
present survey, he analyzed the data without an article breakdown. Consequently,
there is no way of knowing the contribution of each article to the overall
index reported. The combined average power of Brewer's survey was .14, .58,
and .78 for small, medium, and large ES respectively. Although direct
comparison cannot be made because of the somewhat different procedures, it is
interesting to note that while the three surveys deal with different content
areas, their findings are generally very similar. The findings of these
surveys do not portray an encouraging situation. On the average, only when
large effects were being studied do the research articles have adequate power.
It should be noted, however, that large effect sizes are not generally
encountered in behavioral research. What is more important, however, is that
the power of the test in the studies reviewed was not determined by design
but rather by default. Moreover, this picture is probably favorably biased
due to the selectivity in accepting for publication articles that find
significant results. Thus, in research in general, power is probably even
lower than indicated in surveys such as this.



Summary and Recommendations

There are several approaches to increasing power:

(1) The most obvious immediate remedy to the problem of low power is
to increase the number of subjects. Other factors held constant, this will
result in increased power.

(2) Increasing the level of alpha will result in more power; however,
"alpha should not be set thoughtlessly, but should reflect a balance in
Type I-Type II error considerations" (Hopkins, 1973, p. 106).

(3) Research may be designed to increase the size of the effect under
study rather than passively attempting to detect whatever effect is obtained,
regardless of how small the effect is (Cohen, 1973).

(4) Other things being equal, a test of a directional hypothesis has more
power than a nondirectional one. Directional tests, however, should be used
judiciously (Cohen, 1969).

(5) Another important aspect is the reliability of the measures. As
currently used, power analysis for the most part assumes high reliability.
To the extent that the measures are unreliable, power will be less than that
expected under optimal conditions (Cleary & Linn, 1969). Needless to say,
high reliabilities are not the norm in behavioral research. This would lend
further support to the assertion that in reality power is probably lower
than what was found to be in these surveys.
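
The five approaches listed above can be compared against a common baseline.
The sketch below is an illustration only. It assumes a two-group comparison
of means under a normal approximation, and it models unreliability with the
classical test theory attenuation d_observed = d_true * sqrt(reliability),
which is an assumption of this sketch rather than the formulation in Cleary
and Linn (1969). The function name and all numerical settings are
hypothetical.

```python
from math import sqrt
from statistics import NormalDist

_STD_NORMAL = NormalDist()


def power(d, n_per_group, alpha=0.05, directional=False, reliability=1.0):
    """Approximate power of a two-group test of means.

    Assumptions of this sketch: normal approximation; an unreliable
    dependent measure attenuates the true effect size by the square
    root of its reliability coefficient."""
    d_observed = d * sqrt(reliability)
    shift = d_observed * sqrt(n_per_group / 2)
    if directional:
        z_crit = _STD_NORMAL.inv_cdf(1 - alpha)
        return 1 - _STD_NORMAL.cdf(z_crit - shift)
    z_crit = _STD_NORMAL.inv_cdf(1 - alpha / 2)
    return (1 - _STD_NORMAL.cdf(z_crit - shift)) + _STD_NORMAL.cdf(-z_crit - shift)


# Baseline: medium effect, 30 subjects per group, alpha = .05,
# nondirectional test, perfectly reliable measure.
baseline = power(0.5, 30)

more_subjects = power(0.5, 60)                # approach (1)
higher_alpha = power(0.5, 30, alpha=0.10)     # approach (2)
larger_effect = power(0.8, 30)                # approach (3)
one_tailed = power(0.5, 30, directional=True) # approach (4)
unreliable = power(0.5, 30, reliability=0.7)  # approach (5), in reverse

for label, value in [("baseline", baseline),
                     ("(1) n = 60 per group", more_subjects),
                     ("(2) alpha = .10", higher_alpha),
                     ("(3) d = 0.8", larger_effect),
                     ("(4) directional test", one_tailed),
                     ("(5) reliability = .70", unreliable)]:
    print(f"{label:<24} power = {value:.2f}")
```

Each of the first four changes raises power above the baseline, while the
less reliable measure lowers it, consistent with the point made in (5).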

In sum, the present survey indicated a serious shortcoming in the design
of research in special education. In our continued efforts to upgrade such
research it is important that considerations of statistical power analysis
become an integral part of the training of researchers in our field, as well
as one of the criteria for the evaluation of research reports submitted for
publication.






References 

Brewer, J. K. On the power of statistical tests in the American Educational
Research Journal. American Educational Research Journal, 1972, 9, 391-401.

Cleary, T. A., & Linn, R. L. Errors of measurement and the power of a statistical
test. British Journal of Mathematical and Statistical Psychology, 1969,
22, 49-55.

Cohen, J. The statistical power of abnormal-social psychological research:
a review. Journal of Abnormal and Social Psychology, 1962, 65, 145-153.

Cohen, J. Statistical power analysis for the behavioral sciences. New York:
Academic Press, 1969.

Cohen, J. Statistical power analysis and research results. American Educational
Research Journal, 1973, 10, 225-230.

Hopkins, K. Preventing the number-one misinterpretations of behavioral research,
or how to increase statistical power. Journal of Special Education, 1973,
7, 103-107.

Miller, J., & Knapp, T. The importance of statistical power in educational
research. Bloomington, Ind.: Phi Delta Kappa, 1971.






Footnote 



¹While r is not a test of significance but a measure of association, it
was decided to treat it in a separate category for the purpose of distinguishing
studies reporting correlations from those focusing on mean differences, as
well as for comparisons with other surveys available in the literature.



